
1,114 research outputs found

GreenPhylDB: A Gene Family Database for plant functional Genomics

Author: Christophe Périn
Marie-Angélique Laporte
Mathieu Rouard
Matthieu G. Conte
Publication date: 22/04/2009

With the increasing number of genomes being sequenced, a major objective is to transfer accurate annotation from characterised proteins to uncharacterised sequences. Consequently, comparative genomics has become a common and efficient strategy in functional genomics. The release of various annotated plant genomes, such as O. sativa and A. thaliana, has made it possible to compile comprehensive lists of gene families defined by automated methods. However, as with gene sequences, manual curation of gene families remains an important requirement. GreenPhylDB comprises protein sequences from 12 fully sequenced plant species, grouped into homeomorphic families using similarity-based methods. Clusters are finally processed by phylogenetic analysis to infer orthologs and paralogs, which will be particularly helpful for studying genome evolution. Before that step, each cluster has to be curated (i.e. properly named and classified) using different sources of information. A web interface for the curation of plant gene families was developed for that purpose. This interface, accessible on GreenPhylDB (http://greenphyl.cirad.fr), centralizes external references (e.g. InterPro, KEGG, Swiss-Prot, PIRSF, PubMed) related to all gene members of the clusters and shows statistics and automatic analyses. We believe that this synthetic view of the data available for a gene cluster, combined with basic guidelines, is an efficient way to provide a reliable method for gene family annotation.

Crossref

CGSpace

Nature Precedings

Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

Author: Conte Matthieu G
Droc Gaetan
Gaillard Sylvain
Perin Christophe
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results: We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion: Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

A look at trails through the pangenome visualization jungle

Author: Conte Matthieu
Durant Éloi
Rouard Mathieu
Sabot François
Publication date: 27/07/2021

High-throughput sequencing technologies have enabled the production of multiple reference genome sequences for a single species. Comparisons of such sequences showed that there are structural variations between individuals of the same species, such as Copy Number Variations (CNV) and Presence/Absence Variations (PAV), that can have a significant impact on phenotypic variation in plants and could be suitable for breeding improved crop varieties. Thus, a single reference genome is insufficient to capture all variations. Pangenomics is an integrative approach that aims to assess such genomic variations, and more, within a group of closely related individuals. Its definition can focus on the whole repertoire of genes within a group or can include blocks of genomic sequences shared between species. We introduce here a new visualization tool based on a linear representation: the PANgenome Analyzer with CHromosomal Exploration (PANACHE). It is a web-based application that enables its users to explore a pangenomic reference divided into multiple panchromosomes.

CGSpace

GreenPhylDB v5: a comparative pangenomic database for plant genomes

Author: Conte Matthieu
Droc Gaëtan
Dufayard Jean-François
Guignon Valentin
Rouard Mathieu
Toure Abdel
Publication venue: Oxford University Press (OUP)
Publication date: 11/12/2020

Comparative genomics is the analysis of genomic relationships among different species and serves as a significant base for evolutionary and functional genomic studies. GreenPhylDB (https://www.greenphyl.org) is a database designed to facilitate the exploration of gene families and homologous relationships among plant genomes, including staple crops critically important for global food security. GreenPhylDB has been available since 2007, after the release of the Arabidopsis thaliana and Oryza sativa genomes, and has undergone multiple releases. With the number of plant genomes currently available, it becomes challenging to select a single reference for comparative genomics studies, but there is still a lack of databases taking advantage of several genomes per species for orthology detection. GreenPhylDB v5 introduces the concept of comparative pangenomics by harnessing multiple genome sequences per species. We created 19 pangenes and processed them with other species still relying on one genome. In total, 46 plant species were considered to build gene families and predict their homologous relationships through phylogenetic-based analyses. In addition, since the previous publication, we rejuvenated the website and included a new set of original tools, including protein-domain combination, tree topology searches and a section for users to store their own results in order to support community curation efforts.

CGSpace

Ten simple rules for developing visualization tools in genomics

Author: Cleary Allan
Conte Matthieu
Durant Éloi
Farmer Andrew
Ganko Eric
Muller Cédric
Rouard Mathieu
Sabot François
Publication venue: Public Library of Science (PLoS)
Publication date: 10/11/2022

The following 10 simple rules are dedicated to biologists and bioinformaticians who, while already being at the crossroads of many fields, want to venture further into the land of Data Visualization (“datavis” or “dataviz” for short). They combine tips and advice that we would have wanted when we first started our own journeys, gathered from our experiences in building genomic and/or datavis tools, and the time spent with related communities. Additionally, they address current challenges in computational biology and the needs of the community.

PubMed Central

CGSpace

GreenPhylDB v2.0: comparative and functional genomics in plants

Author: Christelle Aluome
Christian M. Zmasek
Christian Walde
Christophe Périn
Gaëtan Droc
Marie-Angélique Laporte
Mathieu Rouard
Matthieu G. Conte
Valentin Guignon
Publication venue: Oxford University Press
Publication date: 01/01/2011

GreenPhylDB is a database designed for comparative and functional genomics based on complete genomes. Version 2 now contains sixteen full genomes of members of the Plantae kingdom, ranging from algae to angiosperms, automatically clustered into gene families. Gene families are manually annotated and then analyzed phylogenetically in order to elucidate orthologous and paralogous relationships. The database offers various lists of gene families, including plant-, phylum- and species-specific gene families. For each gene cluster or gene family, easy access to gene composition, protein domains, publications, external links and orthologous gene predictions is provided. Web interfaces have been further developed to improve navigation through information related to gene families. New analysis tools are also available, such as a gene family ontology browser that facilitates exploration. GreenPhylDB is a component of the South Green Bioinformatics Platform (http://southgreen.cirad.fr/) and is accessible at http://greenphyl.cirad.fr. It enables comparative genomics in a broad taxonomic context to enhance the understanding of evolutionary processes and thus tends to speed up gene discovery.

Crossref

PubMed Central

Agritrop

The Generation Challenge Programme comparative plant stress-responsive gene catalogue

Author: Kimmen Sjolander
Manuel Ruiz
Mathieu Rouard
Matthieu Conte
Mylah Anacleto
Nandini Krishnamurthy
Ramil Mauleon
Richard M. Bruskiewich
Samart Wanchana
Supat Thongjuea
Theo van Hintum
Victor Jun Ulat
Publication venue: Oxford University Press
Publication date: 01/01/2008

The Generation Challenge Programme (GCP; www.generationcp.org) has developed an online resource documenting stress-responsive genes comparatively across plant species. This public resource is a compendium of protein families, phylogenetic trees, multiple sequence alignments (MSA) and associated experimental evidence. The central objective of this resource is to elucidate orthologous and paralogous relationships between plant genes that may be involved in the response to environmental stress, mainly abiotic stresses such as water deficit (‘drought’). The web-based graphical user interface (GUI) of the resource includes query and visualization tools that allow diverse searches and browsing of the underlying project database. The web interface can be accessed at http://dayhoff.generationcp.org.

Crossref

PubMed Central

Agritrop